This presentation is open source. All the code is executable. To run locally, clone
git@github.com:rustyrazorblade/python-presentation.git
and then
pip install -r requirements.txt
Full permission (and encouragement!) granted to use this material for your own presentations.
By Jon Haddad
This is a quick refresher into Cassandra data modeling. We're going to go over keyspace and table creation first, then dig into how CQL translates to table layout.
We're going to use the ipython-cql extension extension which lets us run CQL queries directly from IPython notebook. It's available on pypi as ipython-cql. Any query can be executed. A one line query can be executed as:
%cql select * from table
And multiline queries require a double %% like so:
%%cql select *
from table
These queries can be executed in cqlsh, just remove the %cql part.
A running Cassandra version > 2.0 required.
To execute a cell in this notebook, simple shift-enter. Cells can be edited and reexecuted as well.
In [ ]:
%load_ext cql
In [ ]:
%cql DROP KEYSPACE if exists tutorial;
%cql CREATE KEYSPACE tutorial WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
In [ ]:
%keyspace tutorial
In [ ]:
%tables
Here's a few tables that define users and photos:
In [ ]:
%%cql
CREATE TABLE IF NOT EXISTS user (
user_id uuid,
name text,
PRIMARY KEY (user_id)
)
In [ ]:
%%cql
CREATE TABLE IF NOT EXISTS photo (
photo_id uuid,
name text,
PRIMARY KEY (photo_id)
);
In [ ]:
%tables
In [ ]:
%desc photo
In [ ]:
%cql INSERT INTO user (user_id, name) VALUES (1d5dfcbe-57a6-4b23-906c-3e48db617f49, 'Jon')
%cql INSERT INTO user (user_id, name) VALUES (909c59a7-57ac-424d-97b8-a096800ab037, 'Steve')
print "Users:"
%cql SELECT * from user;
In [ ]:
%cql INSERT INTO photo (photo_id, name) VALUES (7dbf7758-e46c-4175-8f03-bf05d2c220d9, 'Handsome Cat')
print "Photos:"
%cql SELECT * from photo;
Within a partition there can be many rows. The rows within a partition are sorted by the clustering key. For example, lets create a likes table. We want to keep track of which users have liked which photo, and we want to be able to query that effeciently. We know our query is always going to ask for "who liked this photo?". So we want all the likes for a particular photo to be stored together.
In [ ]:
%%cql
CREATE TABLE IF NOT EXISTS like (
photo_id uuid,
user_id uuid,
primary key (photo_id, user_id)
);
In [ ]:
%tables
A primary key is written as (partition_key, clustering_key...)
When SSTAbles are written to disk, likes will be partitioned by photo_id, and sorted by user_id (the clustering key)
This makes queries predictable even for tens of thousands of rows by minimizing disk seeks.
In [ ]:
%%cql
CREATE TABLE users (
id text PRIMARY KEY,
given text,
surname text,
favs map<text, text> // A map of text keys, and text values
)
In [ ]:
%%cql
INSERT INTO users (id, given, surname, favs)
VALUES ('jsmith', 'John', 'Smith', { 'fruit' : 'apple', 'band' : 'Beatles' })
In [ ]:
%cql select * from users;
In [ ]:
%%cql
CREATE TABLE images (
name text PRIMARY KEY,
owner text,
date timestamp,
tags set<text>
);
In [ ]:
%%cql
INSERT INTO images (name, owner, date, tags)
VALUES ('cat.jpg', 'jsmith', 'now', { 'kitten', 'cat', 'pet' });
In [ ]:
%cql select * from images;
In [ ]:
%%cql
UPDATE images SET tags = tags + { 'cute', 'cuddly' } WHERE name = 'cat.jpg';
In [ ]:
%cql select * from images
In [ ]:
%%cql
BEGIN BATCH
INSERT INTO users (id, given, surname) VALUES ('jhaddad', 'Jon', 'Haddad');
INSERT INTO users (id, given, surname) VALUES ('ltran', 'Lina', 'Tran');
APPLY BATCH
In [ ]:
%cql SELECT * from users;
In [ ]:
%cql INSERT INTO users (id, given, surname) VALUES ('ltillman', 'Luke', 'Tillman') USING TTL 1;
%cql SELECT * from users WHERE id = 'ltillman';
In [ ]:
%cql SELECT * from users WHERE id = 'ltillman';
Inserts do not do a check if the row already exists. This can lead to race conditions in your code. If you have a situation where this absolutely cannot happen, you can use a lightweight transaction. For example:
In [ ]:
%cql INSERT INTO users (id, given, surname) values ('psmith', 'Pete', 'Smith');
%cql INSERT INTO users (id, given, surname) values ('psmith', 'Paul', 'Smith');
%cql select * from users where id = 'psmith'; # overwrote the original!
Let's look at the original user account:
In [ ]:
%cql SELECT * from users where id = 'jhaddad';
Now, lets try to overwrite it with an INSERT using a lightweight transaction:
In [ ]:
%cql INSERT INTO users (id, given, surname) values ('jhaddad', 'Joseph', 'Haddad') IF NOT EXISTS
We can see the original record comes back if the transaction fails. We can verify the db has not been changed:
In [ ]:
%cql SELECT * from users where id = 'jhaddad';